Abstract

Asymptotic behavior of the singular value decomposition (SVD) of blown up matrices and normalized blown up contingency tables exposed to Wigner-noise is investigated. It is proved that such an $m\times n$ matrix almost surely has a constant number of large singular values (of order $\sqrt{mn}$), while the rest of the singular values are of order $\sqrt{m+n}$ as $m,n\to\infty$. Concentration results of Alon et al. for the eigenvalues of large symmetric random matrices are adapted to the rectangular case, and on this basis, almost sure results for the singular values as well as for the corresponding isotropic subspaces are proved. An algorithm, applicable to two-way classification of microarrays, is also given that finds the underlying block structure.
1 Introduction



A general problem of multivariate statistics is to find linear structures in large real-world data sets such as internet or microarray measurements. In [5], large symmetric blown up matrices burdened with a so-called symmetric Wigner-noise were investigated. It was proved that such an $n\times n$ matrix has some protruding eigenvalues (of order $n$), while the majority of the eigenvalues is at most of order $\sqrt{n}$ with probability tending to 1 as $n\to\infty$. Our goal is to generalize these results to the stability of the SVD of large rectangular random matrices and to apply them to the contingency table matrix formed by categorical variables in order to perform two-way clustering of these variables. First we introduce some notation.

Definition 1 The $m\times n$ real matrix $W$ is a Wigner-noise if its entries $w_{ij}$ ($1\le i\le m$, $1\le j\le n$) are independent random variables, $\mathbb{E}(w_{ij})=0$, and the $w_{ij}$'s are uniformly bounded (i.e., there is a constant $K > 0$, independent of $m$ and $n$, such that $|w_{ij}|\le K$ for all $i,j$).

Though the main results of this paper can be extended to $w_{ij}$'s with any light-tailed distribution (in particular to Gaussian distributed $w_{ij}$'s), our almost sure results will be based on the assumptions of Definition 1.

Definition 2 The $m\times n$ real matrix $B$ is a blown up matrix if there is an $a\times b$ so-called pattern matrix $P$ with entries $0\le p_{ij}\le 1$, and there are positive integers $m_1,\dots,m_a$ with $\sum_{i=1}^{a}m_i=m$ and $n_1,\dots,n_b$ with $\sum_{j=1}^{b}n_j=n$ such that the matrix $B$ can be divided into $a\times b$ blocks, where block $(i,j)$ is an $m_i\times n_j$ matrix with entries equal to $p_{ij}$ ($1\le i\le a$, $1\le j\le b$).
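As a concrete illustration of Definition 2, the following NumPy sketch blows up a small pattern matrix. The helper name `blow_up` and the block sizes are our own hypothetical choices, not part of the paper; repeating row $i$ of $P$ $m_i$ times and column $j$ $n_j$ times produces exactly the $a\times b$ block layout described above.

```python
import numpy as np

def blow_up(P, row_sizes, col_sizes):
    """Blow up the a x b pattern matrix P: block (i, j) becomes an
    row_sizes[i] x col_sizes[j] block with all entries equal to P[i, j]."""
    P = np.asarray(P, dtype=float)
    if P.shape != (len(row_sizes), len(col_sizes)):
        raise ValueError("block sizes must match the shape of P")
    # np.repeat copies row i of P row_sizes[i] times, then column j col_sizes[j] times.
    return np.repeat(np.repeat(P, row_sizes, axis=0), col_sizes, axis=1)

P = np.array([[0.9, 0.1],
              [0.2, 0.7]])
B = blow_up(P, row_sizes=[3, 2], col_sizes=[2, 4])
print(B.shape)  # (5, 6)
```

Since $B$ only repeats rows and columns of $P$, its rank equals $\operatorname{rank}(P)$, which is used repeatedly below.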

Such schemes are sought for in microarray analysis, where they are called chess-board patterns, cf. [9]. Let us fix the matrix $P$, blow it up to obtain the matrix $B$, and let $A=B+W$, where $W$ is a Wigner-noise of appropriate size. We are interested in the properties of $A$ when $m_1,\dots,m_a\to\infty$ and $n_1,\dots,n_b\to\infty$, roughly speaking, at the same rate. More precisely, we impose two different constraints on the growth of the sizes $m$, $n$ and on the growth rate of their components. The first one is needed for all our reasoning, while the second one will be used only in the case of noisy correspondence matrices.

Definition 3

GC1 (Growth Condition 1)

There exists a constant $0 < c < 1$ such that $m_i/m\ge c$ ($i=1,\dots,a$), and there exists a constant $0 < d < 1$ such that $n_j/n\ge d$ ($j=1,\dots,b$).

GC2 (Growth Condition 2)

There exist constants $C > 1$, $D > 1$, and $C_0 > 0$, $D_0 > 0$ such that $m\le C_0\cdot n^{C}$ and $n\le D_0\cdot m^{D}$ hold for sufficiently large $m$ and $n$.

Remark 4 GC1 implies that

$$c\le\frac{m_k}{m_i}\le\frac{1}{c}\quad\text{and}\quad d\le\frac{n_\ell}{n_j}\le\frac{1}{d}\tag{1}$$

hold for any pair of indices $k,i\in\{1,\dots,a\}$ and $\ell,j\in\{1,\dots,b\}$.

We want to establish some property $\mathcal{P}_{m,n}$ that holds for the $m\times n$ random matrix $A=B+W$ (briefly, $A_{m\times n}$) with $m$ and $n$ large enough. In this paper $\mathcal{P}_{m,n}$ is mostly related to the SVD of $A_{m\times n}$.

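Definition 5 Property $\mathcal{P}_{m,n}$ holds for $A_{m\times n}$ almost surely (with probability 1) if

$$\mathbb{P}\left(\exists\, m_0,n_0\in\mathbb{N}\text{ such that for }m\ge m_0\text{ and }n\ge n_0\ A_{m\times n}\text{ has }\mathcal{P}_{m,n}\right)=1.$$

Here we may assume GC1 or GC2 for the growth of $m$ and $n$, while $K$ is kept fixed.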

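In the combinatorics literature, convergence in probability, that is,

$$\lim_{m,n\to\infty}\mathbb{P}\left(A_{m\times n}\text{ has }\mathcal{P}_{m,n}\right)=1,$$

is frequently considered, and, by the Borel–Cantelli lemma, it implies almost sure convergence if, in addition, $\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}p_{mn} < \infty$ also holds, where

$$p_{mn}=\mathbb{P}\left(A_{m\times n}\text{ does not have }\mathcal{P}_{m,n}\right).$$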



According to a generalization of a theorem of Füredi and Komlós [7] to rectangular matrices, the spectral norm of an $m\times n$ Wigner-noise is $O(\sqrt{m+n})$ in probability. More precisely, it was shown (see [1]) that with probability tending to 1, $\|W\|$ is at most a constant multiple of $\sigma\sqrt{m+n}$, where $\sigma^2$ is the common bound for the variances of the entries. Trivially, $\sigma\le K$, which does not depend on $m$ and $n$; hence $\|W\|=O(\sqrt{m+n})$ in probability. Bounding the variances from below, the authors also proved that $\|W\|=\Theta(\sqrt{m+n})$ with high probability for large $m,n$.
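The $\sqrt{m+n}$ scaling can be seen in a quick simulation. The sketch below assumes random-sign entries ($K=\sigma=1$), which form one admissible Wigner-noise; the sizes and the seed are arbitrary choices. For such matrices $\|W\|$ is known to be close to $\sqrt{m}+\sqrt{n}$, so the printed ratios $\|W\|/\sqrt{m+n}$ should lie roughly between 1 and $\sqrt{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sign_noise(m, n, rng):
    """An m x n Wigner-noise in the sense of Definition 1: independent,
    zero-mean entries bounded by K = 1 (here random signs, variance 1)."""
    return rng.choice([-1.0, 1.0], size=(m, n))

ratios = {}
for m, n in [(50, 400), (200, 300), (400, 400), (800, 200)]:
    W = random_sign_noise(m, n, rng)
    # np.linalg.norm(W, 2) is the spectral norm, i.e. the largest singular value.
    ratios[(m, n)] = np.linalg.norm(W, 2) / np.sqrt(m + n)

for size, ratio in ratios.items():
    print(size, round(float(ratio), 3))
```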

To prove almost sure convergence, a sharp concentration theorem of N. Alon et al. plays a crucial role (cf. [2]). For completeness we formulate this result.

Lemma 6 Let $W$ be a $q\times q$ real symmetric matrix whose entries in and above the main diagonal are independent random variables with absolute value at most 1. Let $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_q$ be the eigenvalues of $W$. The following estimate holds for the deviation of the $i$th largest eigenvalue from its expectation with any positive real number $t$:

$$\mathbb{P}\left(|\lambda_i-\mathbb{E}(\lambda_i)| > t\right)\le\exp\left(-\frac{(1-o(1))\,t^2}{32\,i^2}\right)\quad\text{when } i\le\frac{q}{2},$$

and the same estimate holds for the probability $\mathbb{P}\left(|\lambda_{q-i+1}-\mathbb{E}(\lambda_{q-i+1})| > t\right)$.
Now let $W$ be a Wigner-noise with entries uniformly bounded by $K$. The $(m+n)\times(m+n)$ symmetric matrix

$$\widetilde{W}=\begin{pmatrix}0 & W\\ W^{T} & 0\end{pmatrix}$$

satisfies, after division by $K$, the conditions of Lemma 6. Its largest and smallest eigenvalues are

$$\lambda_i(\widetilde{W})=-\lambda_{m+n-i+1}(\widetilde{W})=s_i(W),\qquad i=1,\dots,\min\{m,n\},$$

and the others are zeros, where $\lambda_i(\cdot)$ and $s_i(\cdot)$ denote the $i$th largest eigenvalue and singular value of the matrix in the argument, respectively (cf. [3]). Therefore

$$\mathbb{P}\left(|s_1(W)-\mathbb{E}(s_1(W))| > t\right)\le\exp\left(-\frac{(1-o(1))\,t^2}{32K^2}\right).\tag{2}$$
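The relation between the dilation $\widetilde{W}$ and the singular values of $W$ is easy to confirm numerically. A small sketch, assuming entries uniform on $[-1,1]$ (so $K=1$); the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
W = rng.uniform(-1.0, 1.0, size=(m, n))            # entries bounded by K = 1

# Symmetric (m + n) x (m + n) dilation [[0, W], [W^T, 0]].
W_tilde = np.block([[np.zeros((m, m)), W],
                    [W.T, np.zeros((n, n))]])

eigs = np.sort(np.linalg.eigvalsh(W_tilde))[::-1]  # eigenvalues, descending
svals = np.linalg.svd(W, compute_uv=False)          # singular values, descending
k = min(m, n)

print(np.allclose(eigs[:k], svals))                 # largest k eigenvalues are s_i(W)
print(np.allclose(eigs[::-1][:k], -svals))          # smallest k eigenvalues are -s_i(W)
```

The remaining $|m-n|$ eigenvalues of $\widetilde{W}$ are zero.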

The fact that $\|W\|=O(\sqrt{m+n})$ in probability and inequality (2) together ensure that $\mathbb{E}(\|W\|)=O(\sqrt{m+n})$. Hence, no matter how $\mathbb{E}(\|W\|)$ behaves when $m\to\infty$ and $n\to\infty$, the following rough estimate holds.

Lemma 7 There exist positive constants $C_{K1}$ and $C_{K2}$, depending on the common bound on the entries of $W$, such that

$$\mathbb{P}\left(\|W\| > C_{K1}\sqrt{m+n}\right)\le\exp\left[-C_{K2}(m+n)\right].\tag{3}$$



The exponential decay of the right-hand side of (3) implies that the spectral norm of a Wigner-noise $W_{m\times n}$ is $O(\sqrt{m+n})$, almost surely. This observation will provide the basis of the almost sure results of Sections 2 and 3.
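For completeness, the summability required by the Borel–Cantelli argument of Definition 5 follows directly from (3); the short computation below is our own addition:

```latex
\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}\mathbb{P}\left(\|W_{m\times n}\|>C_{K1}\sqrt{m+n}\right)
\le\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}e^{-C_{K2}(m+n)}
=\Big(\sum_{m=1}^{\infty}e^{-C_{K2}m}\Big)^{2}
=\frac{1}{\left(e^{C_{K2}}-1\right)^{2}}<\infty .
```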

In Section 2 we shall prove that the $m\times n$ noisy matrix $A=B+W$ almost surely has $r=\operatorname{rank}(P)$ protruding singular values of order $\sqrt{mn}$. In Section 3 the distances of the corresponding isotropic subspaces are estimated, and this gives rise to a two-way classification of the row and column items of $A$ with sum of inner variances $O\left(\frac{m+n}{mn}\right)$, almost surely.

In Definition 2 we required that the entries of the pattern matrix $P$ be in the $[0,1]$ interval. We made this restriction only for the sake of the generalized Erdős–Rényi hypergraph model with the entries of $P$ as probabilities, see [6]. In fact, our results are valid for any pattern matrix of fixed size with non-negative entries. For example, in microarray measurements the rows correspond to different genes, the columns correspond to different conditions, and the entries are the expression levels of a specific gene under a specific condition.
Sometimes the pattern matrix $P$ is an $a\times b$ contingency table with entries that are nonnegative integers. Then the blown up matrix $B$ can be regarded as a larger ($m\times n$) contingency table that contains, e.g., counts for two categorical variables with $m$ and $n$ different categories, respectively. As the categories may be measured in different units, a normalization is necessary. This normalization is made by dividing the entries of $B$ by the square roots of the corresponding row and column sums (cf. [9]). This transformation is identical to that of correspondence analysis [8], and the transformed matrix remains the same when we multiply the initial matrix by a positive constant. The transformed matrix $B_{\mathrm{corr}}$, which belongs to $B$, has entries in $[0,1]$ and maximum singular value 1. It is proved that there is a remarkable gap between the $\operatorname{rank}(B)=\operatorname{rank}(P)$ largest and the other singular values of $A_{\mathrm{corr}}$, the matrix obtained from the noisy matrix $A=B+W$ by the correspondence transformation. This implies good two-way classification properties of the row and column categories (genes and conditions) in Section 4.

In Section 5 we give a construction showing how a blown up structure behind a real-life matrix with a few protruding singular values and 'well classifiable' corresponding singular vector pairs can be found.



2 Singular values of a noisy matrix 

Proposition 8 If GC1 holds, then all the non-zero singular values of the $m\times n$ blown up matrix $B$ are of order $\sqrt{mn}$.



PROOF. As there are at most $a$ linearly independent rows and at most $b$ linearly independent columns in $B$, the rank $r$ of the matrix $B$ cannot exceed $\min\{a,b\}$. Let $s_1\ge s_2\ge\dots\ge s_r > 0$ be the positive singular values of $B$. Let $v_k\in\mathbb{R}^m$, $u_k\in\mathbb{R}^n$ be a singular vector pair corresponding to $s_k$, $k=1,\dots,r$. Without loss of generality, $v_1,\dots,v_r$ and $u_1,\dots,u_r$ can be chosen as unit-norm, pairwise orthogonal vectors in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively.

For the subsequent calculations we drop the subscript $k$; $v,u$ denotes a singular vector pair corresponding to the singular value $s > 0$ of the blown up matrix $B$, $\|v\|=\|u\|=1$. It is easy to see that they have piecewise constant structures: $v$ has $m_i$ coordinates equal to $v(i)$ ($i=1,\dots,a$) and $u$ has $n_j$ coordinates equal to $u(j)$ ($j=1,\dots,b$). Then, with these coordinates, the singular value–singular vector equation

$$Bu=s\cdot v\tag{4}$$

has the form

$$\sum_{j=1}^{b}n_j\,p_{ij}\,u(j)=s\cdot v(i)\qquad(i=1,\dots,a).\tag{5}$$

With the notations

$$\tilde u=(u(1),\dots,u(b))^{T},\qquad\tilde v=(v(1),\dots,v(a))^{T},$$

$$D_m=\operatorname{diag}(m_1,\dots,m_a),\qquad D_n=\operatorname{diag}(n_1,\dots,n_b),$$

the equations in (5) can be written as

$$PD_n\tilde u=s\cdot\tilde v.$$

Introducing the following transformations of $\tilde u$ and $\tilde v$

$$w=D_n^{1/2}\tilde u,\qquad z=D_m^{1/2}\tilde v,\tag{6}$$

the equation is equivalent to

$$D_m^{1/2}PD_n^{1/2}w=s\cdot z.\tag{7}$$

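Applying the transformation (6) to the $\tilde u_k,\tilde v_k$ pairs obtained from the $u_k,v_k$ pairs ($k=1,\dots,r$), orthonormal systems in $\mathbb{R}^b$ and $\mathbb{R}^a$ are obtained:

$$w_k^{T}w_\ell=\sum_{j=1}^{b}n_j\,u_k(j)\,u_\ell(j)=\delta_{k\ell}\quad\text{and}\quad z_k^{T}z_\ell=\sum_{i=1}^{a}m_i\,v_k(i)\,v_\ell(i)=\delta_{k\ell}.$$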

Consequently, $z_k,w_k$ is a singular vector pair corresponding to the singular value $s_k$ of the $a\times b$ matrix $D_m^{1/2}PD_n^{1/2}$ ($k=1,\dots,r$). With the shrinking

$$D_m'=\frac{1}{m}D_m,\qquad D_n'=\frac{1}{n}D_n,$$

an equivalent form of (7) is

$$D_m'^{1/2}PD_n'^{1/2}w=\frac{s}{\sqrt{mn}}\,z,$$

that is, the $a\times b$ matrix $D_m'^{1/2}PD_n'^{1/2}$ has non-zero singular values $s_k/\sqrt{mn}$ with the same singular vector pairs $z_k,w_k$ ($k=1,\dots,r$). If the $s_k$'s are not distinct numbers, the singular vector pairs corresponding to a multiple singular value are not unique, but they can still be obtained from the SVD of the shrunken matrix $D_m'^{1/2}PD_n'^{1/2}$.
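The reduction to the shrunken $a\times b$ matrix can be checked numerically. A minimal sketch with an assumed pattern matrix and block sizes (they are illustrative, not taken from the paper):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.3],
              [0.2, 0.7, 0.4]])            # a x b pattern matrix (a = 2, b = 3)
row_sizes = np.array([30, 50])              # m_1, ..., m_a
col_sizes = np.array([20, 40, 60])          # n_1, ..., n_b
m, n = row_sizes.sum(), col_sizes.sum()     # m = 80, n = 120

B = np.repeat(np.repeat(P, row_sizes, axis=0), col_sizes, axis=1)

# Shrunken a x b matrix D_m'^{1/2} P D_n'^{1/2}, where D_m' = D_m / m and D_n' = D_n / n.
shrunk = np.diag(np.sqrt(row_sizes / m)) @ P @ np.diag(np.sqrt(col_sizes / n))

r = np.linalg.matrix_rank(P)
s_B = np.linalg.svd(B, compute_uv=False)
s_shrunk = np.linalg.svd(shrunk, compute_uv=False)

# The r non-zero singular values of B are sqrt(mn) times those of the shrunken matrix.
print(np.allclose(s_B[:r], np.sqrt(m * n) * s_shrunk[:r]))
```

So the SVD of the large matrix $B$ is obtained from an SVD of a fixed-size matrix, which is what makes the order $\sqrt{mn}$ transparent.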

Now we want to establish relations between the singular values of $P$ and $D_m'^{1/2}PD_n'^{1/2}$. Let $s_k(Q)$ denote the $k$th largest singular value of a matrix $Q$. By the Courant–Fischer–Weyl minimax principle (cf. [3, p. 75])

$$s_k(Q)=\max_{\dim H=k}\ \min_{x\in H,\,x\ne0}\frac{\|Qx\|}{\|x\|}.$$
Since we are interested only in the first $r$ singular values, where $r=\operatorname{rank}(B)=\operatorname{rank}(D_m'^{1/2}PD_n'^{1/2})$, it is sufficient to consider vectors $x$ for which $D_m'^{1/2}PD_n'^{1/2}x\ne0$. Therefore, with $k\in\{1,\dots,r\}$ and an arbitrary $k$-dimensional subspace $H\subset\mathbb{R}^b$ one can write



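$$\min_{x\in H}\frac{\|D_m'^{1/2}PD_n'^{1/2}x\|}{\|x\|}=\min_{x\in H}\left(\frac{\|D_m'^{1/2}PD_n'^{1/2}x\|}{\|PD_n'^{1/2}x\|}\cdot\frac{\|PD_n'^{1/2}x\|}{\|D_n'^{1/2}x\|}\cdot\frac{\|D_n'^{1/2}x\|}{\|x\|}\right)\ge\sqrt{c}\,\sqrt{d}\,\min_{x\in H}\frac{\|PD_n'^{1/2}x\|}{\|D_n'^{1/2}x\|}$$

with $c,d$ of GC1, since $D_m'$ and $D_n'$ are diagonal with entries $m_i/m\ge c$ and $n_j/n\ge d$. As $x$ runs over $H$, the vector $D_n'^{1/2}x$ runs over the $k$-dimensional subspace $D_n'^{1/2}H$. Now, taking the maximum over all possible $k$-dimensional subspaces $H$, we obtain that $s_k(D_m'^{1/2}PD_n'^{1/2})\ge\sqrt{cd}\cdot s_k(P) > 0$. On the other hand,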

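$$s_k(D_m'^{1/2}PD_n'^{1/2})\le\|D_m'^{1/2}PD_n'^{1/2}\|\le\|D_m'^{1/2}\|\cdot\|P\|\cdot\|D_n'^{1/2}\|\le\|P\|\le\sqrt{ab},$$

where the last inequality holds because the entries of $P$ are in $[0,1]$. These inequalities imply that $s_k(D_m'^{1/2}PD_n'^{1/2})$ lies between two positive constants that do not depend on $m$ and $n$, and because of $s_k(D_m'^{1/2}PD_n'^{1/2})=\frac{s_k}{\sqrt{mn}}$ we obtain that $s_1,\dots,s_r=\Theta(\sqrt{mn})$. □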
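Both bounds in the proof can be checked on a concrete instance. The sketch below assumes a random $3\times4$ pattern matrix and takes the largest constants $c=\min_i m_i/m$ and $d=\min_j n_j/n$ allowed by GC1 for the chosen block sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 3, 4
P = rng.uniform(0.0, 1.0, size=(a, b))       # assumed pattern matrix with entries in [0, 1]
row_sizes = np.array([100, 250, 150])
col_sizes = np.array([80, 120, 200, 100])
m, n = row_sizes.sum(), col_sizes.sum()

c = (row_sizes / m).min()                    # largest admissible c in GC1
d = (col_sizes / n).min()                    # largest admissible d in GC1

shrunk = np.diag(np.sqrt(row_sizes / m)) @ P @ np.diag(np.sqrt(col_sizes / n))
r = np.linalg.matrix_rank(P)
s_P = np.linalg.svd(P, compute_uv=False)[:r]
s_shrunk = np.linalg.svd(shrunk, compute_uv=False)[:r]

lower = np.sqrt(c * d) * s_P                 # sqrt(cd) * s_k(P)
upper = np.linalg.norm(P, 2)                 # ||P||
print(lower, s_shrunk, upper)
```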

Theorem 9 Let $A=B+W$ be an $m\times n$ random matrix, where $B$ is a blown up matrix with positive singular values $s_1,\dots,s_r$ and $W$ is a Wigner-noise. Then, under GC1, the matrix $A$ almost surely has $r$ singular values $z_1,\dots,z_r$ such that

$$|z_i-s_i|=O(\sqrt{m+n}),\qquad i=1,\dots,r,$$

and for the other singular values almost surely

$$z_j=O(\sqrt{m+n}),\qquad j=r+1,\dots,\min\{m,n\}.$$



PROOF. The statement follows from the analog of Weyl's perturbation theorem for singular values of rectangular matrices (see [3, p. 99]) and from Lemma 7. If $s_i(A)$ and $s_i(B)$ denote the $i$th largest singular values of the matrix in the argument, then for the difference of the corresponding pairs

$$|s_i(A)-s_i(B)|\le\max_i s_i(W)=\|W\|,\qquad i=1,\dots,\min\{m,n\}.$$

By Lemma 7,

$$\mathbb{P}\left(|s_i(A)-s_i(B)| > C_{K1}\sqrt{m+n}\right)\le\mathbb{P}\left(\|W\| > C_{K1}\sqrt{m+n}\right)\le\exp\left[-C_{K2}(m+n)\right].$$

The right-hand side of the last inequality is the general term of a convergent series (defined as a double summation); thus, by the Borel–Cantelli lemma, the convergence in probability implies the almost sure statement of the theorem. □
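The Weyl-type inequality used in the proof is deterministic, so it can be checked on any single sample. A minimal NumPy sketch with an assumed $2\times2$ pattern, block sizes and random-sign noise:

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.8, 0.2],
              [0.3, 0.6]])
# Blown up matrix with m = 200, n = 240 and an assumed random-sign Wigner-noise (K = 1).
B = np.repeat(np.repeat(P, [120, 80], axis=0), [90, 150], axis=1)
W = rng.choice([-1.0, 1.0], size=B.shape)
A = B + W

s_A = np.linalg.svd(A, compute_uv=False)   # descending, min(m, n) values
s_B = np.linalg.svd(B, compute_uv=False)
norm_W = np.linalg.norm(W, 2)              # spectral norm ||W||

# |s_i(A) - s_i(B)| <= ||W|| for every i = 1, ..., min(m, n).
max_shift = np.max(np.abs(s_A - s_B))
print(max_shift, norm_W)
```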

Corollary 10 With the notations

$$\varepsilon:=\|W\|=O(\sqrt{m+n})\quad\text{and}\quad\Delta:=\min_{1\le i\le r}s_i(B)=\min_{1\le i\le r}s_i=\Theta(\sqrt{mn})\tag{8}$$

there is a spectral gap of size $\Delta-2\varepsilon$ between the $r$ largest and the other singular values of the perturbed matrix $A$, and this gap is significantly larger than $\varepsilon$.
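The spectral gap of Corollary 10 can be illustrated on one simulated instance. In the sketch below, the pattern matrix, the block sizes and the random-sign noise are assumptions chosen so that $\Delta > 2\varepsilon$ already holds at moderate size; the corollary itself is asymptotic.

```python
import numpy as np

rng = np.random.default_rng(4)
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
r = np.linalg.matrix_rank(P)                                        # r = 2
B = np.repeat(np.repeat(P, [200, 200], axis=0), [300, 300], axis=1)  # m = 400, n = 600
W = rng.choice([-1.0, 1.0], size=B.shape)                           # random-sign noise, K = 1
A = B + W

eps = np.linalg.norm(W, 2)                          # epsilon = ||W||
Delta = np.linalg.svd(B, compute_uv=False)[r - 1]   # Delta = smallest non-zero s_i(B)
s_A = np.linalg.svd(A, compute_uv=False)
gap = s_A[r - 1] - s_A[r]                           # z_r minus the (r+1)th singular value

print(f"Delta = {Delta:.1f}, eps = {eps:.1f}, gap = {gap:.1f}, bound = {Delta - 2 * eps:.1f}")
```

By Weyl's inequality, $z_r\ge\Delta-\varepsilon$ and the $(r+1)$th singular value is at most $\varepsilon$, so the printed gap is always at least $\Delta-2\varepsilon$.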



3 Classification via singular vector pairs 

With the help of Theorem 9 we can estimate the distances between the corresponding right- and left-hand side eigenspaces (isotropic subspaces) of the matrices $B$ and $A=B+W$. Let $v_1,\dots,v_m\in\mathbb{R}^m$ and $u_1,\dots,u_n\in\mathbb{R}^n$ be orthonormal left- and right-hand side singular vectors of $B$:

$$Bu_i=s_i\cdot v_i\ \ (i=1,\dots,r)\qquad\text{and}\qquad Bu_j=0\ \ (j=r+1,\dots,n).$$

Let us also denote the unit-norm, pairwise orthogonal left- and right-hand side singular vectors corresponding to the $r$ protruding singular values $z_1,\dots,z_r$ of $A$ by $y_1,\dots,y_r\in\mathbb{R}^m$ and $x_1,\dots,x_r\in\mathbb{R}^n$, respectively. Then $Ax_i=z_iy_i$ ($i=1,\dots,r$). Let

$$F:=\operatorname{Span}\{v_1,\dots,v_r\}\quad\text{and}\quad G:=\operatorname{Span}\{u_1,\dots,u_r\}$$

denote the spanned linear subspaces in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively; further, let $\operatorname{dist}(y,F)$ denote the Euclidean distance between the vector $y$ and the subspace $F$.

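Proposition 11 With the above notation, under GC1, the following estimate holds almost surely:

$$\sum_{i=1}^{r}\operatorname{dist}^2(y_i,F)\le r\,\frac{\varepsilon^2}{(\Delta-\varepsilon)^2}=O\left(\frac{m+n}{mn}\right)\tag{9}$$

and analogously,

$$\sum_{i=1}^{r}\operatorname{dist}^2(x_i,G)\le r\,\frac{\varepsilon^2}{(\Delta-\varepsilon)^2}=O\left(\frac{m+n}{mn}\right).\tag{10}$$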



PROOF. Let us choose one of the right-hand side singular vectors $x_1,\dots,x_r$ of $A=B+W$ and denote it simply by $x$, with corresponding singular value $z$. We shall estimate the distance between $y=Ax/z$ and $F$, and similarly between $x$ and $G$. For this purpose we expand $x$ and $y$ in the orthonormal bases $u_1,\dots,u_n$ and $v_1,\dots,v_m$, respectively:

$$x=\sum_{i=1}^{n}t_iu_i\qquad\text{and}\qquad y=\sum_{i=1}^{m}l_iv_i.$$
Then

$$Ax=(B+W)x=\sum_{i=1}^{r}t_is_iv_i+Wx,\tag{11}$$

and, on the other hand,

$$Ax=zy=\sum_{i=1}^{m}z\,l_iv_i.\tag{12}$$

Equating the right-hand sides of (11) and (12) we obtain

$$\sum_{i=1}^{r}(zl_i-t_is_i)\,v_i+\sum_{i=r+1}^{m}zl_i\,v_i=Wx.$$

Applying the Pythagorean theorem,

$$\sum_{i=1}^{r}(zl_i-t_is_i)^2+z^2\sum_{i=r+1}^{m}l_i^2=\|Wx\|^2\le\varepsilon^2\tag{13}$$

because $\|x\|=1$ and $\|W\|=\varepsilon$.

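As $z\ge\Delta-\varepsilon$ holds almost surely by Theorem 9, (13) yields

$$\operatorname{dist}^2(y,F)=\sum_{i=r+1}^{m}l_i^2\le\frac{\varepsilon^2}{z^2}\le\frac{\varepsilon^2}{(\Delta-\varepsilon)^2}.$$

The order of the above estimate follows from the order of $\varepsilon$ and $\Delta$ in (8):

$$\operatorname{dist}^2(y,F)=O\left(\frac{m+n}{mn}\right)\tag{14}$$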

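almost surely. Applying (14) to the left-hand side singular vectors $y_1,\dots,y_r$, by Definition 5

$$\mathbb{P}\left(\exists\, m_{0i},n_{0i}\in\mathbb{N}\text{ such that for }m\ge m_{0i}\text{ and }n\ge n_{0i}:\ \operatorname{dist}^2(y_i,F)\le\frac{\varepsilon^2}{(\Delta-\varepsilon)^2}\right)=1$$

for $i=1,\dots,r$. Hence, intersecting these $r$ events of probability 1 and taking $m_0=\max_i m_{0i}$, $n_0=\max_i n_{0i}$,

$$\mathbb{P}\left(\exists\, m_0,n_0\in\mathbb{N}\text{ such that for }m\ge m_0\text{ and }n\ge n_0:\ \operatorname{dist}^2(y_i,F)\le\frac{\varepsilon^2}{(\Delta-\varepsilon)^2},\ i=1,\dots,r\right)=1,$$

consequently,

$$\mathbb{P}\left(\exists\, m_0,n_0\in\mathbb{N}\text{ such that for }m\ge m_0\text{ and }n\ge n_0:\ \sum_{i=1}^{r}\operatorname{dist}^2(y_i,F)\le r\,\frac{\varepsilon^2}{(\Delta-\varepsilon)^2}\right)=1$$

also holds, and this finishes the proof of the first statement.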

The estimate for the squared distance between $G$ and a right-hand side singular vector $x$ of $A$ follows in the same way, starting with $A^{T}y_i=z_i\cdot x_i$ and using the fact that $A^{T}$ has the same singular values as $A$. □



By Proposition 11, the individual distances of the perturbed singular vectors from the original subspaces, and also the sum of their squares, tend to zero almost surely as $m,n\to\infty$.
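Every step of the proof above is deterministic once $\varepsilon < \Delta$, so the bounds (9) and (10) can be checked on a single sample. The sketch assumes the same kind of simulated data as before (pattern matrix, block sizes and random-sign noise are our choices); note that NumPy's left singular vectors play the role of the $v$'s and $y$'s of the text, and its right singular vectors that of the $u$'s and $x$'s.

```python
import numpy as np

rng = np.random.default_rng(5)
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
r = 2
B = np.repeat(np.repeat(P, [200, 200], axis=0), [300, 300], axis=1)
A = B + rng.choice([-1.0, 1.0], size=B.shape)

Lb, s_B, Rtb = np.linalg.svd(B, full_matrices=False)
La, s_A, Rta = np.linalg.svd(A, full_matrices=False)
F, G = Lb[:, :r], Rtb[:r].T      # orthonormal bases of F and G
Y, X = La[:, :r], Rta[:r].T      # y_1..y_r and x_1..x_r of A

def sum_sq_dist(vectors, basis):
    """Sum of squared distances of the columns of `vectors` from the
    subspace spanned by the orthonormal columns of `basis`."""
    residual = vectors - basis @ (basis.T @ vectors)
    return float(np.sum(residual ** 2))

eps = np.linalg.norm(A - B, 2)
Delta = s_B[r - 1]
bound = r * eps ** 2 / (Delta - eps) ** 2
print(sum_sq_dist(Y, F), sum_sq_dist(X, G), bound)
```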

Now let $A$ be a microarray on $m$ genes and $n$ conditions, with $a_{ij}$ denoting the expression level of gene $i$ under condition $j$. We suppose that $A$ is a noisy random matrix obtained by adding a Wigner-noise $W$ to the blown up matrix $B$. Let us denote by $A_1,\dots,A_a$ the partition of the genes and by $B_1,\dots,B_b$ the partition of the conditions with respect to the blow-up (they can also be thought of as clusters of genes and conditions).

Proposition 11 also implies the well-clustering property of the representatives of the genes and conditions in the following representation. Let $Y$ be the $m\times r$ matrix containing the left-hand side singular vectors $y_1,\dots,y_r$ of $A$ in its columns. Similarly, let $X$ be the $n\times r$ matrix containing the right-hand side singular vectors $x_1,\dots,x_r$ of $A$ in its columns. Let the $r$-dimensional representatives of the genes be the row vectors of $Y$: $y^1,\dots,y^m\in\mathbb{R}^r$, while the $r$-dimensional representatives of the conditions are the row vectors of $X$: $x^1,\dots,x^n\in\mathbb{R}^r$. Let $S_a^2(Y)$ denote the $a$-variance, introduced in [4], of the genes' representatives:

$$S_a^2(Y)=\min_{\{A_1',\dots,A_a'\}}\sum_{i=1}^{a}\sum_{j\in A_i'}\|y^j-\bar y^i\|^2,\qquad\text{where }\bar y^i=\frac{1}{|A_i'|}\sum_{j\in A_i'}y^j,$$

while $S_b^2(X)$ denotes the $b$-variance of the conditions' representatives:

$$S_b^2(X)=\min_{\{B_1',\dots,B_b'\}}\sum_{i=1}^{b}\sum_{j\in B_i'}\|x^j-\bar x^i\|^2,\qquad\text{where }\bar x^i=\frac{1}{|B_i'|}\sum_{j\in B_i'}x^j,$$

the partitions $\{A_1',\dots,A_a'\}$ and $\{B_1',\dots,B_b'\}$ varying over all $a$- and $b$-partitions of the genes and conditions, respectively.

Theorem 12 With the above notation, under GC1, for the $a$- and $b$-variances of the representation of the microarray $A$ the relations

$$S_a^2(Y)=O\left(\frac{m+n}{mn}\right)\quad\text{and}\quad S_b^2(X)=O\left(\frac{m+n}{mn}\right)$$

hold almost surely.
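PROOF. By the proof of Theorem 3 of [4] it can easily be seen that

$$S_a^2(Y)\le\sum_{i=1}^{a}\sum_{j\in A_i}\|y^j-\bar y^i\|^2\le\sum_{i=1}^{r}\operatorname{dist}^2(y_i,F)\quad\text{and}\quad S_b^2(X)\le\sum_{i=1}^{b}\sum_{j\in B_i}\|x^j-\bar x^i\|^2\le\sum_{i=1}^{r}\operatorname{dist}^2(x_i,G),$$

where the centers are taken with respect to the partitions $\{A_i\}$ and $\{B_i\}$ of the blow-up. (The middle sums are the squared distances of $y_1,\dots,y_r$, respectively $x_1,\dots,x_r$, from the subspace of vectors that are constant on the blocks, and this subspace contains $F$, respectively $G$.) The right-hand sides are the left-hand sides of (9) and (10), respectively, therefore they are also of order $\frac{m+n}{mn}$. □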
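The first inequality in the proof can be made concrete: evaluating the inner variance on the true partition of the genes gives an upper bound for $S_a^2(Y)$, which in turn is at most $\sum_i\operatorname{dist}^2(y_i,F)$. In the sketch, the helper `inner_variance` is hypothetical and the data are simulated under assumed sizes and random-sign noise; computing the minimizing partition itself is a $k$-means type problem that is not attempted here.

```python
import numpy as np

def inner_variance(reps, labels):
    """Sum over clusters of squared distances of the rows of `reps`
    from their cluster mean (the inner variance of a given partition)."""
    total = 0.0
    for lab in np.unique(labels):
        block = reps[labels == lab]
        total += float(np.sum((block - block.mean(axis=0)) ** 2))
    return total

rng = np.random.default_rng(6)
P = np.array([[0.9, 0.1, 0.5],
              [0.1, 0.9, 0.5]])
row_sizes, col_sizes = [150, 250], [100, 200, 300]
B = np.repeat(np.repeat(P, row_sizes, axis=0), col_sizes, axis=1)
A = B + rng.choice([-1.0, 1.0], size=B.shape)
r = np.linalg.matrix_rank(P)

Y = np.linalg.svd(A, full_matrices=False)[0][:, :r]           # gene representatives = rows of Y
F = np.linalg.svd(B, full_matrices=False)[0][:, :r]           # orthonormal basis of F
gene_labels = np.repeat(np.arange(len(row_sizes)), row_sizes)  # true blocks A_1, ..., A_a

upper_Sa2 = inner_variance(Y, gene_labels)                     # an upper bound for S_a^2(Y)
dist_sum = float(np.sum((Y - F @ (F.T @ Y)) ** 2))             # sum_i dist^2(y_i, F)
print(upper_Sa2, dist_sum)
```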



Hence, the addition of any kind of Wigner-noise to a rectangular matrix that has a blown up structure $B$ will not change the order of the protruding singular values, and the block structure of $B$ can be reconstructed from the representatives of the row and column items of the noisy matrix $A$.
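To illustrate this reconstruction, the sketch below shuffles the rows of a noisy blown up matrix and recovers the gene blocks from the rows of $Y$ with plain Lloyd ($k$-means) iterations. This is only a heuristic stand-in for the minimization in the $a$-variance, not the paper's algorithm; the helper `kmeans`, the sizes and the uniform noise on $[-1,1]$ are our assumptions.

```python
import numpy as np

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations with farthest-point initialisation (heuristic)."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assignment step: nearest center in Euclidean distance.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: cluster means (keep the old center if a cluster empties).
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

rng = np.random.default_rng(9)
P = np.array([[0.9, 0.1, 0.5],
              [0.1, 0.9, 0.5]])
row_sizes, col_sizes = [150, 250], [100, 200, 300]
B = np.repeat(np.repeat(P, row_sizes, axis=0), col_sizes, axis=1)   # 400 x 600
A = B + rng.uniform(-1.0, 1.0, size=B.shape)                         # assumed noise, K = 1

perm = rng.permutation(A.shape[0])            # hide the block order of the genes
true_labels = np.repeat([0, 1], row_sizes)[perm]
U, s, Vt = np.linalg.svd(A[perm], full_matrices=False)
labels = kmeans(U[:, :2], k=2)                # rows of Y with r = rank(P) = 2

# Fraction of genes assigned to their true block, up to swapping the two labels.
agreement = max(np.mean(labels == true_labels), np.mean(labels != true_labels))
print(agreement)
```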

With an appropriate Wigner-noise, we can achieve that the matrix $B+W$ contains 1's in its $(i,j)$-th block with probability $p_{ij}$, and 0's otherwise. That is, for $i=1,\dots,a$, $j=1,\dots,b$, $\ell\in A_i$, $k\in B_j$, let

$$w_{\ell k}:=\begin{cases}1-p_{ij}, & \text{with probability } p_{ij},\\ -p_{ij}, & \text{with probability } 1-p_{ij}\end{cases}\tag{15}$$

be independent random variables. This $W$ satisfies the conditions of Definition 1 with entries uniformly bounded by 1, zero expectation, and variance

$$\sigma^2=\max_{1\le i\le a;\,1\le j\le b}p_{ij}(1-p_{ij})\le\frac{1}{4}.$$

The noisy matrix $A$ becomes a 0-1 matrix that can be regarded as the incidence matrix of a hypergraph on $m$ vertices and $n$ edges. (Vertices correspond to the genes and edges correspond to the conditions. The incidence relation depends on whether a specific gene is expressed or not under a specific condition.)

By the choice (15) of $W$, vertices of the vertex set $A_i$ appear in edges of the edge set $B_j$ with probability $p_{ij}$ (the set $A_i$ of genes equally influences the set $B_j$ of conditions, like the chess-board pattern of [9]). This is a generalization of the classical Erdős–Rényi model to random hypergraphs with several blocks, see [6]. The question of how such a chess-board pattern behind a random (especially 0-1) matrix can be found under specific conditions is discussed in Section 5.
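The choice (15) simply makes $A$ a matrix of independent Bernoulli entries with block-wise success probabilities $p_{ij}$. A minimal sketch under assumed probabilities and block sizes:

```python
import numpy as np

rng = np.random.default_rng(7)
P = np.array([[0.8, 0.1],
              [0.3, 0.6]])                    # assumed probabilities p_ij
row_sizes, col_sizes = [300, 200], [250, 250]
B = np.repeat(np.repeat(P, row_sizes, axis=0), col_sizes, axis=1)

# Entry (l, k) in block (i, j) is 1 with probability p_ij and 0 otherwise,
# so W = A - B takes the value 1 - p_ij or -p_ij as in (15).
A = (rng.random(B.shape) < B).astype(float)
W = A - B
sigma2 = np.max(P * (1 - P))                  # common variance bound of the entries of W
print(A[:300, :250].mean(), sigma2)
```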



4 Perturbation results for correspondence matrices 



Now the pattern matrix $P$ contains arbitrary non-negative entries, and so does the blown up matrix $B$. Let us suppose that there are no identically zero rows or columns. We perform the correspondence transformation described below on $B$. We are interested in the order of the singular values of the matrix $A=B+W$ when the same correspondence transformation is applied to it. To this end, we introduce the following notations:
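$$D_{\mathrm{Brow}}=\operatorname{diag}(d_{\mathrm{Brow},1},\dots,d_{\mathrm{Brow},m}):=\operatorname{diag}\Big(\sum_{j=1}^{n}b_{1j},\dots,\sum_{j=1}^{n}b_{mj}\Big),$$

$$D_{\mathrm{Bcol}}=\operatorname{diag}(d_{\mathrm{Bcol},1},\dots,d_{\mathrm{Bcol},n}):=\operatorname{diag}\Big(\sum_{i=1}^{m}b_{i1},\dots,\sum_{i=1}^{m}b_{in}\Big),$$

$$D_{\mathrm{Arow}}=\operatorname{diag}(d_{\mathrm{Arow},1},\dots,d_{\mathrm{Arow},m}):=\operatorname{diag}\Big(\sum_{j=1}^{n}a_{1j},\dots,\sum_{j=1}^{n}a_{mj}\Big),$$

$$D_{\mathrm{Acol}}=\operatorname{diag}(d_{\mathrm{Acol},1},\dots,d_{\mathrm{Acol},n}):=\operatorname{diag}\Big(\sum_{i=1}^{m}a_{i1},\dots,\sum_{i=1}^{m}a_{in}\Big).$$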



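Further, set

$$B_{\mathrm{corr}}:=D_{\mathrm{Brow}}^{-1/2}BD_{\mathrm{Bcol}}^{-1/2}\quad\text{and}\quad A_{\mathrm{corr}}:=D_{\mathrm{Arow}}^{-1/2}AD_{\mathrm{Acol}}^{-1/2}$$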

for the transformed matrices obtained from $B$ and $A$ while carrying out correspondence analysis on $B$ and the same correspondence transformation on $A$. It is well known [8] that the leading singular value of $B_{\mathrm{corr}}$ is equal to 1 and the multiplicity of 1 as a singular value coincides with the number of irreducible blocks in $B$. Let $s_i$ denote a non-zero singular value of $B_{\mathrm{corr}}$ with unit-norm singular vector pair $v_i,u_i$. With the transformations

$$v_{\mathrm{corr},i}:=D_{\mathrm{Brow}}^{-1/2}v_i,\qquad u_{\mathrm{corr},i}:=D_{\mathrm{Bcol}}^{-1/2}u_i$$

the so-called correspondence vector pairs are obtained. If the coordinates $v_{\mathrm{corr},i}(j)$, $u_{\mathrm{corr},i}(\ell)$ of such a pair are regarded as possible values of two discrete random variables $\beta_i$ and $\alpha_i$ (often called the $i$th correspondence factor pair) with the prescribed marginals, then, as in canonical analysis, their correlation is $s_i$, and this is the largest possible correlation under the condition that they are uncorrelated with the previous random variables $\beta_1,\dots,\beta_{i-1}$ and $\alpha_1,\dots,\alpha_{i-1}$, respectively ($i > 1$).
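The correspondence transformation and the correspondence vectors can be sketched in a few lines of NumPy. The helper name `correspondence_transform` and the small contingency table are illustrative assumptions. For a matrix with positive entries the leading singular value is 1 and the first correspondence vectors are constant, in line with the trivial correspondence factor pair; the transformation is also invariant under multiplying the matrix by a positive constant.

```python
import numpy as np

def correspondence_transform(M):
    """Return D_row^{-1/2} M D_col^{-1/2}, where D_row and D_col are the
    diagonal matrices of the row and column sums of M (assumed positive)."""
    M = np.asarray(M, dtype=float)
    row, col = M.sum(axis=1), M.sum(axis=0)
    if np.any(row <= 0) or np.any(col <= 0):
        raise ValueError("row and column sums must be positive")
    return M / np.sqrt(row)[:, None] / np.sqrt(col)[None, :]

P = np.array([[5.0, 1.0, 2.0],
              [1.0, 4.0, 3.0]])                      # small contingency table
B = np.repeat(np.repeat(P, [3, 5], axis=0), [2, 2, 4], axis=1)
B_corr = correspondence_transform(B)
V, s, Ut = np.linalg.svd(B_corr)                     # columns of V: v_i; rows of Ut: u_i

# Correspondence vectors v_corr_i = D_Brow^{-1/2} v_i and u_corr_i = D_Bcol^{-1/2} u_i.
v_corr = V / np.sqrt(B.sum(axis=1))[:, None]
u_corr = Ut.T / np.sqrt(B.sum(axis=0))[:, None]
print(np.round(s[:3], 6))
```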

If $s_1=1$ is a single singular value, then $v_{\mathrm{corr},1}$ and $u_{\mathrm{corr},1}$ are constant vectors (the all 1 vectors when the entries of $B$ sum up to 1), and the corresponding $\beta_1,\alpha_1$ pair is regarded as a trivial correspondence factor pair. This corresponds to the general case. Keeping $k\le\operatorname{rank}(B_{\mathrm{corr}})=\operatorname{rank}(B)=\operatorname{rank}(P)$ singular values with the coordinates of the corresponding $k-1$ non-trivial correspondence factor pairs, the following $(k-1)$-dimensional representation of the $j$th and $\ell$th categories of the underlying two discrete variables is obtained:

$$v_{\mathrm{corr}}^{j}:=(v_{\mathrm{corr},2}(j),\dots,v_{\mathrm{corr},k}(j))\quad\text{and}\quad u_{\mathrm{corr}}^{\ell}:=(u_{\mathrm{corr},2}(\ell),\dots,u_{\mathrm{corr},k}(\ell)).$$



This representation has the following optimality properties: the closeness of categories of the same variable reflects the similarity between them, while the closeness of categories of different variables reflects their frequent simultaneous occurrence. For example, if $B$ is a microarray, then the representatives of genes with similar function, as well as the representatives of similar conditions, are close to each other; also, the representatives of genes that are responsible for a given condition are close to the representatives of those conditions. Now we prove the following.

Proposition 13 Given the blown up matrix $B$, under GC1 there exists a constant $\delta\in(0,1)$, independent of $m$ and $n$, such that all the $r$ non-zero singular values of $B_{\mathrm{corr}}$ are in the interval $[\delta,1]$, where $r=\operatorname{rank}(B)=\operatorname{rank}(P)$.



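PROOF. It is easy to see that $B_{\mathrm{corr}}$ is the blown up matrix of the $a\times b$ pattern matrix $\tilde P$ with entries

$$\tilde p_{ij}=\frac{p_{ij}}{\sqrt{\left(\sum_{\ell=1}^{b}p_{i\ell}n_\ell\right)\left(\sum_{k=1}^{a}p_{kj}m_k\right)}}.$$

Following the considerations of the proof of Proposition 8, the blown up matrix $B_{\mathrm{corr}}$ has exactly $r=\operatorname{rank}(\tilde P)=\operatorname{rank}(P)$ non-zero singular values, which are the singular values of the $a\times b$ matrix $P'=D_m^{1/2}\tilde PD_n^{1/2}$ with entries

$$p'_{ij}=\frac{p_{ij}\sqrt{m_in_j}}{\sqrt{\left(\sum_{\ell=1}^{b}p_{i\ell}n_\ell\right)\left(\sum_{k=1}^{a}p_{kj}m_k\right)}}=\frac{p_{ij}}{\sqrt{\left(\sum_{\ell=1}^{b}p_{i\ell}\frac{n_\ell}{n_j}\right)\left(\sum_{k=1}^{a}p_{kj}\frac{m_k}{m_i}\right)}}.$$

Since the matrix $P$ contains no identically zero rows or columns, the matrix $P'$ varies on a compact set of $a\times b$ matrices determined by the inequalities (1), and this set does not depend on $m$ and $n$. The non-zero singular values depend continuously on the matrix, and every matrix of this set has rank $r$. Therefore, the minimum non-zero singular value is bounded from below by a positive constant that does not depend on $m$ or $n$. Because the largest singular value is 1, this finishes the proof. □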
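The reduction in the proof can be verified numerically: the non-zero singular values of $B_{\mathrm{corr}}$ coincide with those of the $a\times b$ matrix $P'$. The pattern matrix and block sizes in this sketch are assumptions:

```python
import numpy as np

P = np.array([[5.0, 1.0, 2.0],
              [1.0, 4.0, 3.0]])
row_sizes = np.array([30, 50])               # m_1, m_2
col_sizes = np.array([20, 20, 40])           # n_1, n_2, n_3

B = np.repeat(np.repeat(P, row_sizes, axis=0), col_sizes, axis=1)
B_corr = B / np.sqrt(B.sum(axis=1))[:, None] / np.sqrt(B.sum(axis=0))[None, :]

# Block row sums sum_l p_il n_l and block column sums sum_k p_kj m_k of B.
block_row = P @ col_sizes
block_col = row_sizes @ P
# P'_ij = p_ij sqrt(m_i n_j) / sqrt((sum_l p_il n_l)(sum_k p_kj m_k)).
P_prime = P * np.sqrt(np.outer(row_sizes, col_sizes)) / np.sqrt(np.outer(block_row, block_col))

r = np.linalg.matrix_rank(P)
s_corr = np.linalg.svd(B_corr, compute_uv=False)[:r]
s_prime = np.linalg.svd(P_prime, compute_uv=False)
delta = s_prime[r - 1]                       # smallest non-zero singular value
print(s_corr, s_prime)
```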



Theorem 14 Under GC1 and GC2, there exists a positive number $\delta$ (independent of $m$ and $n$) such that for every $0 < \tau < 1/2$ the following statement holds almost surely: the $r$ largest singular values of $A_{\mathrm{corr}}$ are in the interval $[\delta-\max\{n^{-\tau},m^{-\tau}\},\,1+\max\{n^{-\tau},m^{-\tau}\}]$, while all the others are at most $\max\{n^{-\tau},m^{-\tau}\}$.
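Theorem 14 is an asymptotic statement, but a single simulated instance shows the expected picture. In the sketch, the positive pattern matrix, the block sizes ($m=n=800$), the uniform noise on $[-0.5,0.5]$ (which keeps $A$ positive) and the choice $\tau=0.4$ are all assumptions; the helper `corr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)

def corr(M):
    """Correspondence transformation D_row^{-1/2} M D_col^{-1/2}."""
    return M / np.sqrt(M.sum(axis=1))[:, None] / np.sqrt(M.sum(axis=0))[None, :]

P = np.array([[3.0, 1.0],
              [1.0, 2.0]])
B = np.repeat(np.repeat(P, [300, 500], axis=0), [400, 400], axis=1)   # m = n = 800
A = B + rng.uniform(-0.5, 0.5, size=B.shape)                           # entries stay positive
r = np.linalg.matrix_rank(P)
m, n = B.shape

s_Bcorr = np.linalg.svd(corr(B), compute_uv=False)
s_Acorr = np.linalg.svd(corr(A), compute_uv=False)
tau = 0.4
eps = max(n ** -tau, m ** -tau)              # max{n^-tau, m^-tau}, about 0.069 here
print(s_Bcorr[:r], s_Acorr[:r + 1], eps)
```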



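PROOF. First notice that

$$A_{\mathrm{corr}}=D_{\mathrm{Arow}}^{-1/2}AD_{\mathrm{Acol}}^{-1/2}=D_{\mathrm{Arow}}^{-1/2}BD_{\mathrm{Acol}}^{-1/2}+D_{\mathrm{Arow}}^{-1/2}WD_{\mathrm{Acol}}^{-1/2}.\tag{16}$$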

The entries of $D_{\mathrm{Brow}}$ and those of $D_{\mathrm{Bcol}}$ are of order $\Theta(n)$ and $\Theta(m)$, respectively. Now we prove that for every $i=1,\dots,m$ and $j=1,\dots,n$, $|d_{\mathrm{Arow},i}-d_{\mathrm{Brow},i}| < n^{1-\tau}$ and $|d_{\mathrm{Acol},j}-d_{\mathrm{Bcol},j}| < m^{1-\tau}$ hold almost surely. To this end, we use Chernoff's inequality for large deviations (cf. [5], Lemma 4.2):



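$$\mathbb{P}\left(|d_{\mathrm{Arow},i}-d_{\mathrm{Brow},i}| > n^{1-\tau}\right)=\mathbb{P}\left(\Big|\sum_{j=1}^{n}w_{ij}\Big| > n^{1-\tau}\right)\le\exp\left\{-\frac{n^{2-2\tau}}{2(n\sigma^2+Kn^{1-\tau}/3)}\right\}=\exp\left\{-\frac{n^{1-2\tau}}{2(\sigma^2+Kn^{-\tau}/3)}\right\}\quad(i=1,\dots,m),$$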



where the constant $K$ is the uniform bound for the $|w_{ij}|$'s and $\sigma^2$ is the bound for their variances. By virtue of GC2 the following estimate holds with some $C_0 > 0$ and $C > 1$ (constants of GC2) and large enough $n$:

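$$\mathbb{P}\left(|d_{\mathrm{Arow},i}-d_{\mathrm{Brow},i}| > n^{1-\tau}\text{ for some }i\in\{1,\dots,m\}\right)\le m\cdot\exp\left\{-\frac{n^{1-2\tau}}{2(\sigma^2+Kn^{-\tau}/3)}\right\}\le C_0\cdot n^{C}\cdot\exp\left\{-\frac{n^{1-2\tau}}{2(\sigma^2+Kn^{-\tau}/3)}\right\}=\exp\left\{\ln C_0+C\ln n-\frac{n^{1-2\tau}}{2(\sigma^2+Kn^{-\tau}/3)}\right\}.\tag{17}$$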



The estimation of the probability

$$\mathbb{P}\left(|d_{\mathrm{Acol},j}-d_{\mathrm{Bcol},j}| > m^{1-\tau}\text{ for some }j\in\{1,\dots,n\}\right)$$

can be treated analogously (with $D_0 > 0$ and $D > 1$ of GC2). The right-hand side of (17) forms a convergent series, therefore

$$\min_{i\in\{1,\dots,m\}}|d_{\mathrm{Arow},i}|=\Theta(n),\qquad\min_{j\in\{1,\dots,n\}}|d_{\mathrm{Acol},j}|=\Theta(m)\tag{18}$$

hold almost surely.

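Now it is straightforward to bound the norm of the second term of (16) by

$$\|D_{\mathrm{Arow}}^{-1/2}\|\cdot\|W\|\cdot\|D_{\mathrm{Acol}}^{-1/2}\|.\tag{19}$$

By (18), $\|D_{\mathrm{Arow}}^{-1/2}\|=\Theta(n^{-1/2})$ and $\|D_{\mathrm{Acol}}^{-1/2}\|=\Theta(m^{-1/2})$ almost surely, and by Lemma 7, $\|W\|=O(\sqrt{m+n})$ holds almost surely; hence the quantity (19) is at most of order $\sqrt{\frac{m+n}{mn}}$ almost surely. Since $\tau < 1/2$, it is therefore almost surely less than $\max\{n^{-\tau},m^{-\tau}\}$ for $m,n$ large enough.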



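To estimate the norm of the first term of (16) let us write it in the form

$$D_{\mathrm{Arow}}^{-1/2}BD_{\mathrm{Acol}}^{-1/2}=D_{\mathrm{Brow}}^{-1/2}BD_{\mathrm{Bcol}}^{-1/2}+\left(D_{\mathrm{Arow}}^{-1/2}-D_{\mathrm{Brow}}^{-1/2}\right)BD_{\mathrm{Acol}}^{-1/2}+D_{\mathrm{Brow}}^{-1/2}B\left(D_{\mathrm{Acol}}^{-1/2}-D_{\mathrm{Bcol}}^{-1/2}\right).\tag{20}$$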
The first term is just $B_{\mathrm{corr}}$, so, due to Proposition 13, we only need to prove that the norms of both remainder terms are almost surely less than $\max\{n^{-\tau},m^{-\tau}\}$. These two terms have a similar appearance, therefore it is enough to estimate one of them. For example, the second term can be bounded by

$$\|D_{\mathrm{Arow}}^{-1/2}-D_{\mathrm{Brow}}^{-1/2}\|\cdot\|B\|\cdot\|D_{\mathrm{Acol}}^{-1/2}\|.\tag{21}$$

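The estimation of the first factor in (21) is as follows:

$$\|D_{\mathrm{Arow}}^{-1/2}-D_{\mathrm{Brow}}^{-1/2}\|=\max_{i\in\{1,\dots,m\}}\left|\frac{1}{\sqrt{d_{\mathrm{Arow},i}}}-\frac{1}{\sqrt{d_{\mathrm{Brow},i}}}\right|=\max_{i\in\{1,\dots,m\}}\frac{|d_{\mathrm{Arow},i}-d_{\mathrm{Brow},i}|}{\sqrt{d_{\mathrm{Arow},i}\,d_{\mathrm{Brow},i}}\left(\sqrt{d_{\mathrm{Arow},i}}+\sqrt{d_{\mathrm{Brow},i}}\right)}\le\max_{i\in\{1,\dots,m\}}\frac{|d_{\mathrm{Arow},i}-d_{\mathrm{Brow},i}|}{\sqrt{d_{\mathrm{Arow},i}\,d_{\mathrm{Brow},i}}}\cdot\max_{i\in\{1,\dots,m\}}\frac{1}{\sqrt{d_{\mathrm{Arow},i}}+\sqrt{d_{\mathrm{Brow},i}}}.\tag{22}$$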

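By relations (18), $\sqrt{d_{\mathrm{Arow},i}\,d_{\mathrm{Brow},i}}=\Theta(n)$ for any $i=1,\dots,m$, and hence

$$\max_{i\in\{1,\dots,m\}}\frac{|d_{\mathrm{Arow},i}-d_{\mathrm{Brow},i}|}{\sqrt{d_{\mathrm{Arow},i}\,d_{\mathrm{Brow},i}}}=O\left(\frac{n^{1-\tau}}{n}\right)=O(n^{-\tau})$$

almost surely; further, $\max_{i\in\{1,\dots,m\}}\frac{1}{\sqrt{d_{\mathrm{Arow},i}}+\sqrt{d_{\mathrm{Brow},i}}}=\Theta(n^{-1/2})$ almost surely. Therefore the left-hand side of (22) is $O(n^{-\tau-1/2})$ almost surely. For the further factors in (21) we obtain $\|B\|=\Theta(\sqrt{mn})$ (see Proposition 8), while $\|D_{\mathrm{Acol}}^{-1/2}\|=\Theta(m^{-1/2})$ almost surely. These together imply that the bound (21) is almost surely of order

$$n^{-\tau-1/2}\cdot n^{1/2}m^{1/2}\cdot m^{-1/2}=n^{-\tau}\le\max\{n^{-\tau},m^{-\tau}\}.$$

This finishes the estimation of the first term in (16), and, by Weyl's perturbation theorem, the proof, too. □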



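Remark 15 In the Gaussian case the large deviation principle can be replaced by the simple estimation of the Gaussian probabilities with any $\kappa > 0$:

$$\mathbb{P}\left(\Big|\sum_{j=1}^{n}w_{ij}\Big| > n\kappa\right)\le\min\left\{1,\ \frac{2\sigma}{\kappa\sqrt{2\pi n}}\exp\left(-\frac{n\kappa^2}{2\sigma^2}\right)\right\}.$$

Setting $\kappa=n^{-\tau}$ we get an estimate analogous to (17).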



Suppose that the blown up matrix $B$ is irreducible and its non-negative entries sum up to 1. This restriction does not affect the result of the correspondence analysis, that is, the SVD of the matrix $B_{\mathrm{corr}}$. Recall that the non-zero singular values of $B_{\mathrm{corr}}$ are the numbers $1=s_1 > s_2\ge\dots\ge s_r > 0$ with unit-norm singular vector pairs $v_i,u_i$ having piecewise constant structure ($i=1,\dots,r$). Set

$$F:=\operatorname{Span}\{v_1,\dots,v_r\}\quad\text{and}\quad G:=\operatorname{Span}\{u_1,\dots,u_r\}.$$

Let $0 < \tau < 1/2$ be arbitrary and $\varepsilon:=\max\{n^{-\tau},m^{-\tau}\}$. Let us also denote the unit-norm, pairwise orthogonal left- and right-hand side singular vectors corresponding to the $r$ singular values $z_1,\dots,z_r\in[\delta-\varepsilon,1+\varepsilon]$ of $A_{\mathrm{corr}}$ (guaranteed by Theorem 14 under GC2) by $y_1,\dots,y_r\in\mathbb{R}^m$ and $x_1,\dots,x_r\in\mathbb{R}^n$, respectively.

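Proposition 16 With the above notation, under GC1 and GC2 the following estimate holds almost surely for the distance between $y_i$ and $F$:

$$\operatorname{dist}(y_i,F)\le\frac{\varepsilon}{\delta-\varepsilon}=O(\max\{n^{-\tau},m^{-\tau}\})\qquad(i=1,\dots,r)\tag{23}$$

and analogously, for the distance between $x_i$ and $G$:

$$\operatorname{dist}(x_i,G)\le\frac{\varepsilon}{\delta-\varepsilon}=O(\max\{n^{-\tau},m^{-\tau}\})\qquad(i=1,\dots,r).\tag{24}$$

PROOF. Follow the method of proving Proposition 11 (under GC1) with $\delta$ instead of $\Delta$ and the present $\varepsilon$ instead of $\varepsilon=\|W\|$. Here GC2 is necessary only for $A_{\mathrm{corr}}$ to have $r$ protruding singular values. □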

Remark 17 The left-hand sides of (23) and (24) are almost surely of order $\max\{n^{-\tau},m^{-\tau}\}$, which tends to zero as $m,n\to\infty$ under GC1 and GC2.

Proposition 16 implies the well-clustering property of the representatives of the two discrete variables by means of the noisy correspondence vector pairs

$$y_{\mathrm{corr},i}:=D_{\mathrm{Arow}}^{-1/2}y_i,\qquad x_{\mathrm{corr},i}:=D_{\mathrm{Acol}}^{-1/2}x_i\qquad(i=1,\dots,r).$$

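Let $Y_{\mathrm{corr}}$ denote the $m\times r$ matrix that contains the left-hand side vectors $y_{\mathrm{corr},1},\dots,y_{\mathrm{corr},r}$ in its columns. Similarly, let $X_{\mathrm{corr}}$ denote the $n\times r$ matrix that contains the right-hand side vectors $x_{\mathrm{corr},1},\dots,x_{\mathrm{corr},r}$ in its columns. The $r$-dimensional representatives of $\alpha$ are the row vectors of $Y_{\mathrm{corr}}$, denoted by $y_{\mathrm{corr}}^{1},\dots,y_{\mathrm{corr}}^{m}\in\mathbb{R}^r$, while the $r$-dimensional representatives of $\beta$ are the row vectors of $X_{\mathrm{corr}}$, denoted by $x_{\mathrm{corr}}^{1},\dots,x_{\mathrm{corr}}^{n}\in\mathbb{R}^r$. With respect to the marginal distributions, let the $a$- and $b$-variances of these representatives be defined by

$$S_a^2(Y_{\mathrm{corr}})=\min_{\{A_1',\dots,A_a'\}}\sum_{i=1}^{a}\sum_{j\in A_i'}d_{\mathrm{Arow},j}\,\|y_{\mathrm{corr}}^{j}-\bar y_{\mathrm{corr}}^{i}\|^2,$$

$$S_b^2(X_{\mathrm{corr}})=\min_{\{B_1',\dots,B_b'\}}\sum_{i=1}^{b}\sum_{j\in B_i'}d_{\mathrm{Acol},j}\,\|x_{\mathrm{corr}}^{j}-\bar x_{\mathrm{corr}}^{i}\|^2,$$

where $\{A_1',\dots,A_a'\}$ and $\{B_1',\dots,B_b'\}$ are $a$- and $b$-partitions of the genes and conditions, respectively, and

$$\bar y_{\mathrm{corr}}^{i}=\frac{\sum_{j\in A_i'}d_{\mathrm{Arow},j}\,y_{\mathrm{corr}}^{j}}{\sum_{j\in A_i'}d_{\mathrm{Arow},j}},\qquad\bar x_{\mathrm{corr}}^{i}=\frac{\sum_{j\in B_i'}d_{\mathrm{Acol},j}\,x_{\mathrm{corr}}^{j}}{\sum_{j\in B_i'}d_{\mathrm{Acol},j}}.$$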

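Theorem 18 With the above notation, under GC1 and GC2,

$$S_a^2(Y_{\mathrm{corr}})\le r\,\frac{\varepsilon^2}{(\delta-\varepsilon)^2}\quad\text{and}\quad S_b^2(X_{\mathrm{corr}})\le r\,\frac{\varepsilon^2}{(\delta-\varepsilon)^2}$$

hold almost surely, where $\varepsilon=\max\{n^{-\tau},m^{-\tau}\}$ with every $0 < \tau < 1/2$.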
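PROOF. An easy calculation shows that

$$S_a^2(Y_{\mathrm{corr}})\le\sum_{i=1}^{a}\sum_{j\in A_i}d_{\mathrm{Arow},j}\,\|y_{\mathrm{corr}}^{j}-\bar y_{\mathrm{corr}}^{i}\|^2=\sum_{i=1}^{r}\operatorname{dist}^2(y_i,F),$$

$$S_b^2(X_{\mathrm{corr}})\le\sum_{i=1}^{b}\sum_{j\in B_i}d_{\mathrm{Acol},j}\,\|x_{\mathrm{corr}}^{j}-\bar x_{\mathrm{corr}}^{i}\|^2=\sum_{i=1}^{r}\operatorname{dist}^2(x_i,G),$$

hence the result of Proposition 16 can be used. □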



Under GC1 and GC2, with $m,n$ large enough, Theorem 18 implies that after performing correspondence analysis on the noisy matrix $A$, the representation through the correspondence vectors belonging to $A_{\mathrm{corr}}$ will also reveal the block structure behind $A$.



5 Recognizing the structure 



One might wonder where the singular values of an $m \times n$ matrix $A = (a_{ij})$ are located if $a := \max_{i,j} |a_{ij}|$ is independent of $m$ and $n$. On the one hand, the maximum singular value cannot exceed $O(\sqrt{mn})$, as it is at most $\sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2} \le a\sqrt{mn}$. On the other hand, let $Q$ be an $m \times n$ random matrix with entries $a$ or $-a$ (independently of each other). Consider the spectral norm of all such matrices and take the minimum of them: $\min_{Q \in \{-a,+a\}^{m\times n}} \|Q\|$. This quantity measures the minimum linear structure that a matrix of the same size and magnitude as $A$ can possess. As the Frobenius norm of $Q$ is $a\sqrt{mn}$, by the inequalities between the spectral and Frobenius norms, the above minimum is at least $\frac{a}{\sqrt{2}}\sqrt{m+n}$, which is exactly the order of the spectral norm of a Wigner-noise.
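
Written out, the norm inequality used here reads as follows, where we use $\|Q\|_F^2 = \sum_i \sigma_i^2(Q) \le \operatorname{rank}(Q)\,\|Q\|^2$ and $\operatorname{rank}(Q) \le \min\{m,n\}$:

```latex
\|Q\| \;\ge\; \frac{\|Q\|_F}{\sqrt{\min\{m,n\}}}
      \;=\; \frac{a\sqrt{mn}}{\sqrt{\min\{m,n\}}}
      \;=\; a\sqrt{\max\{m,n\}}
      \;\ge\; a\sqrt{\frac{m+n}{2}}
      \;=\; \frac{a}{\sqrt{2}}\,\sqrt{m+n}.
```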
So an $m \times n$ random matrix (whose entries are independent and uniformly bounded) under very general conditions has at least one singular value of order greater than $\sqrt{m+n}$. Suppose there are $k$ such singular values and the representatives by means of the corresponding singular vector pairs can be well classified in the sense of Theorem 12 (cf. the introduction to that theorem). Under these conditions we can reconstruct a blown up structure behind our matrix.
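
A small simulation illustrates this dichotomy. The sizes, the pattern matrix and the cut-off $3\sqrt{m+n}$ below are our own illustrative choices, not taken from the text: a $3 \times 2$ pattern of rank 2 is blown up to $300 \times 200$ and burdened with independent $\pm 1$ entries, which form a Wigner-noise.

```python
import numpy as np

# Hypothetical sizes and pattern, chosen only for illustration.
rng = np.random.default_rng(0)
m, n = 300, 200
P = np.array([[1.0, -1.0], [2.0, 1.0], [-1.0, 2.0]])   # 3 x 2 pattern of rank 2
row_labels = np.repeat(np.arange(3), m // 3)             # 3 row blocks of size 100
col_labels = np.repeat(np.arange(2), n // 2)             # 2 column blocks of size 100
B = P[row_labels][:, col_labels]                         # blown up matrix
W = rng.choice([-1.0, 1.0], size=(m, n))                 # independent, zero-mean, bounded entries
A = B + W

s = np.linalg.svd(A, compute_uv=False)                   # singular values, descending
threshold = 3.0 * np.sqrt(m + n)                         # illustrative cut-off constant 3
k = int(np.sum(s > threshold))
print(k, np.round(s[:3], 1))
```

The noiseless blown up matrix has singular values $100\sqrt{7} \approx 264.6$ and $100\sqrt{5} \approx 223.6$; by Weyl's inequality those of $A$ stay within $\|W\|$ of them, while the remaining ones are at most $\|W\|$, which is roughly $\sqrt{m} + \sqrt{n} \approx 31.5$ here.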

Theorem 19 Let $A_{m\times n}$ be a sequence of $m \times n$ matrices, where $m$ and $n$ tend to infinity. Assume that $A_{m\times n}$ has exactly $k$ singular values of order greater than $\sqrt{m+n}$ ($k$ is fixed). If there are integers $a \ge k$ and $b \ge k$ such that the $a$- and $b$-variances of the row- and column-representatives are $O(\frac{m+n}{mn})$, then there is a blown up matrix $B_{m\times n}$ such that $A_{m\times n} = B_{m\times n} + E_{m\times n}$, with $\|E_{m\times n}\| = O(\sqrt{m+n})$.



PROOF. The proof gives an explicit construction for $B_{m\times n}$. In the sequel the subscripts $m$ and $n$ will be dropped. We shall speak in terms of microarrays (genes and conditions).

Let $\mathbf{y}_1, \dots, \mathbf{y}_k \in \mathbb{R}^m$ and $\mathbf{x}_1, \dots, \mathbf{x}_k \in \mathbb{R}^n$ denote the left- and right-hand side unit-norm singular vectors corresponding to $z_1, \dots, z_k$, the singular values of $A$ of order larger than $\sqrt{m+n}$. The $k$-dimensional representatives of the genes and conditions, that is, the row vectors of the $m \times k$ matrix $Y = (\mathbf{y}_1, \dots, \mathbf{y}_k)$ and those of the $n \times k$ matrix $X = (\mathbf{x}_1, \dots, \mathbf{x}_k)$, respectively, by the condition of the theorem form $a$ and $b$ clusters in $\mathbb{R}^k$, respectively, with sums of inner variances $O(\frac{m+n}{mn})$. Reorder the rows and columns of $A$ according to the clusters. Denote by $\mathbf{y}^1, \dots, \mathbf{y}^m \in \mathbb{R}^k$ and $\mathbf{x}^1, \dots, \mathbf{x}^n \in \mathbb{R}^k$ the Euclidean representatives of the genes and conditions (the rows of the reordered $Y$ and $X$), and let $\bar{\mathbf{y}}^1, \dots, \bar{\mathbf{y}}^a \in \mathbb{R}^k$ and $\bar{\mathbf{x}}^1, \dots, \bar{\mathbf{x}}^b \in \mathbb{R}^k$ denote the cluster centers, respectively. Now let us choose the following new representation of the genes and conditions. Let the genes' representatives be the row vectors of the $m \times k$ matrix $\hat{Y}$ such that the first $m_1$ rows of $\hat{Y}$ are equal to $\bar{\mathbf{y}}^1$, the next $m_2$ rows to $\bar{\mathbf{y}}^2$, and so on, and the last $m_a$ rows of $\hat{Y}$ are equal to $\bar{\mathbf{y}}^a$, where $m_1, \dots, m_a$ are the cluster sizes; similarly, let the conditions' representatives be the row vectors of the $n \times k$ matrix $\hat{X}$ such that the first $n_1$ rows of $\hat{X}$ are equal to $\bar{\mathbf{x}}^1$, and so on, and the last $n_b$ rows of $\hat{X}$ are equal to $\bar{\mathbf{x}}^b$.
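
Replacing each representative by its cluster center can be sketched as follows (NumPy; `blow_up_centers` is a hypothetical name). Instead of reordering the rows, the sketch works with cluster labels, which yields $\hat{Y}$ up to the same row permutation; the columns of the result are piecewise constant on the clusters.

```python
import numpy as np


def blow_up_centers(Y, labels):
    """Return Y_hat: every row of Y replaced by the mean of its cluster."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    Y_hat = np.empty_like(Y)
    for c in np.unique(labels):
        idx = labels == c
        Y_hat[idx] = Y[idx].mean(axis=0)     # cluster center, repeated for each member
    return Y_hat
```

The squared Frobenius distance $\|Y - \hat{Y}\|_F^2$ is then exactly the sum of inner variances of the clusters.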

By the considerations of Theorem 12 and the assumption for the clusters,

$$\sum_{i=1}^{k} \operatorname{dist}^2(\mathbf{y}_i, F) = O\Big(\frac{m+n}{mn}\Big) \tag{25}$$

and

$$\sum_{i=1}^{k} \operatorname{dist}^2(\mathbf{x}_i, G) = O\Big(\frac{m+n}{mn}\Big) \tag{26}$$

hold, where the $k$-dimensional subspace $F \subset \mathbb{R}^m$ is spanned by the column vectors of $\hat{Y}$, while the $k$-dimensional subspace $G \subset \mathbb{R}^n$ is spanned by the column vectors of $\hat{X}$. We follow the construction given in [4] (see Proposition 2) of a set $\mathbf{v}_1, \dots, \mathbf{v}_k$ of orthonormal vectors within $F$ and another set $\mathbf{u}_1, \dots, \mathbf{u}_k$ of orthonormal vectors within $G$ such that



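$$\sum_{i=1}^{k} \|\mathbf{y}_i - \mathbf{v}_i\|^2 = \min_{\mathbf{v}'_1,\dots,\mathbf{v}'_k} \sum_{i=1}^{k} \|\mathbf{y}_i - \mathbf{v}'_i\|^2 \le 2 \sum_{i=1}^{k} \operatorname{dist}^2(\mathbf{y}_i, F) \tag{27}$$

and

$$\sum_{i=1}^{k} \|\mathbf{x}_i - \mathbf{u}_i\|^2 = \min_{\mathbf{u}'_1,\dots,\mathbf{u}'_k} \sum_{i=1}^{k} \|\mathbf{x}_i - \mathbf{u}'_i\|^2 \le 2 \sum_{i=1}^{k} \operatorname{dist}^2(\mathbf{x}_i, G) \tag{28}$$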

hold, where the minimum is taken over orthonormal sets of vectors $\mathbf{v}'_1, \dots, \mathbf{v}'_k \in F$ and $\mathbf{u}'_1, \dots, \mathbf{u}'_k \in G$, respectively. The construction of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ is as follows ($\mathbf{u}_1, \dots, \mathbf{u}_k$ can be constructed in the same way). Let $\mathbf{v}'_1, \dots, \mathbf{v}'_k \in F$ be an arbitrary orthonormal system (obtained, e.g., by the Schmidt orthogonalization method). Let $V' = (\mathbf{v}'_1, \dots, \mathbf{v}'_k)$ be the $m \times k$ matrix with these columns, and let

$$Y^T V' = Q S Z^T$$

be its SVD, where the matrix $S$ contains the singular values of the $k \times k$ matrix $Y^T V'$ in its main diagonal and zeros otherwise, while $Q$ and $Z$ are $k \times k$ orthogonal matrices (containing the corresponding unit-norm singular vector pairs in their columns). The orthogonal matrix $R = Z Q^T$ gives the appropriate orthogonal rotation of the vectors $\mathbf{v}'_1, \dots, \mathbf{v}'_k$. That is, the column vectors of the matrix $V = V'R$ also form an orthonormal set, which is the desired set $\mathbf{v}_1, \dots, \mathbf{v}_k$.
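
The rotation step is an instance of the orthogonal Procrustes problem. Below is a minimal NumPy sketch of it (hypothetical function name), assuming that $Y$ has orthonormal columns and that the columns of $V'$ form an orthonormal basis of $F$.

```python
import numpy as np


def procrustes_basis(Y, V_prime):
    """Rotate the orthonormal basis V' of F as close as possible to Y.

    Y       : (m, k) matrix with orthonormal columns y_1, ..., y_k
    V_prime : (m, k) matrix whose columns v'_1, ..., v'_k span F orthonormally
    Returns V = V' R with R = Z Q^T, where Y^T V' = Q S Z^T is an SVD.
    """
    Q, S, Zt = np.linalg.svd(Y.T @ V_prime)   # Y^T V' = Q diag(S) Z^T
    R = Zt.T @ Q.T                             # R = Z Q^T, a k x k orthogonal matrix
    return V_prime @ R
```

Since `np.linalg.svd` returns $Q$, the singular values and $Z^T$ in this order, $R = ZQ^T$ is `Zt.T @ Q.T`. With orthonormal $Y$ the residual equals $2k - 2\sum_j \sigma_j(Y^TV')$, whereas $\sum_i \operatorname{dist}^2(\mathbf{y}_i, F) = k - \sum_j \sigma_j^2(Y^TV')$; as all $\sigma_j \in [0,1]$, this is one way to see the factor 2 in (27).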

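Define the error terms $\mathbf{r}_i$ and $\mathbf{q}_i$, respectively:

$$\mathbf{r}_i = \mathbf{y}_i - \mathbf{v}_i \quad\text{and}\quad \mathbf{q}_i = \mathbf{x}_i - \mathbf{u}_i \qquad (i = 1, \dots, k).$$

In view of (25)-(28),

$$\sum_{i=1}^{k} \|\mathbf{r}_i\|^2 = O\Big(\frac{m+n}{mn}\Big) \quad\text{and}\quad \sum_{i=1}^{k} \|\mathbf{q}_i\|^2 = O\Big(\frac{m+n}{mn}\Big). \tag{29}$$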
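
One way to see (25), and with it the first half of (29), is the following chain. Here $\hat{\mathbf{y}}_i$ denotes the $i$-th column of $\hat{Y}$, which lies in $F$, and $c(j)$ denotes the cluster of gene $j$ (our notation); the last sum is the sum of inner variances, which is $O(\frac{m+n}{mn})$ by assumption. (26) and the second half of (29) follow in the same way.

```latex
\sum_{i=1}^{k}\|\mathbf r_i\|^2 \;\le\; 2\sum_{i=1}^{k}\operatorname{dist}^2(\mathbf y_i,F)
 \;\le\; 2\sum_{i=1}^{k}\|\mathbf y_i-\hat{\mathbf y}_i\|^2
 \;=\; 2\,\|Y-\hat Y\|_F^2
 \;=\; 2\sum_{j=1}^{m}\bigl\|\mathbf y^{j}-\bar{\mathbf y}^{c(j)}\bigr\|^2 .
```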



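Consider the following decomposition:

$$A = \sum_{i=1}^{\min\{m,n\}} z_i \mathbf{y}_i \mathbf{x}_i^T = \sum_{i=1}^{k} z_i \mathbf{y}_i \mathbf{x}_i^T + \sum_{i=k+1}^{\min\{m,n\}} z_i \mathbf{y}_i \mathbf{x}_i^T .$$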

The spectral norm of the second term is $z_{k+1}$, which by assumption is at most of order $\sqrt{m+n}$. Now consider the first term:

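$$\sum_{i=1}^{k} z_i \mathbf{y}_i \mathbf{x}_i^T = \sum_{i=1}^{k} z_i (\mathbf{v}_i + \mathbf{r}_i)(\mathbf{u}_i + \mathbf{q}_i)^T = \sum_{i=1}^{k} z_i \mathbf{v}_i \mathbf{u}_i^T + \sum_{i=1}^{k} z_i \mathbf{v}_i \mathbf{q}_i^T + \sum_{i=1}^{k} z_i \mathbf{r}_i \mathbf{u}_i^T + \sum_{i=1}^{k} z_i \mathbf{r}_i \mathbf{q}_i^T . \tag{30}$$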



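Since $\mathbf{v}_1, \dots, \mathbf{v}_k$ and $\mathbf{u}_1, \dots, \mathbf{u}_k$ are unit vectors, the last three terms in (30) can be estimated by means of the relations

$$\|\mathbf{v}_i \mathbf{u}_i^T\| = 1, \quad \|\mathbf{v}_i \mathbf{q}_i^T\| = \|\mathbf{q}_i\|, \quad \|\mathbf{r}_i \mathbf{u}_i^T\| = \|\mathbf{r}_i\|, \quad \|\mathbf{r}_i \mathbf{q}_i^T\| = \|\mathbf{r}_i\| \cdot \|\mathbf{q}_i\| \qquad (i = 1, \dots, k).$$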

Taking into account that $z_i$ cannot exceed $O(\sqrt{mn})$ and $k$ is fixed, due to (29) we get that the spectral norms of the last three terms in (30) (to their finitely many subterms the triangle inequality is applicable) are at most of order $\sqrt{m+n}$. Let $B$ be the first term, i.e.,

$$B = \sum_{i=1}^{k} z_i \mathbf{v}_i \mathbf{u}_i^T ;$$

then $\|A - B\| = O(\sqrt{m+n})$.
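
For instance, for the second term of (30) the order estimate reads as follows; the third term is handled in the same way, and the fourth one is even smaller, because $\|\mathbf{q}_i\| = O(\sqrt{(m+n)/(mn)}) \to 0$:

```latex
z_i\,\|\mathbf v_i\mathbf q_i^T\| = z_i\,\|\mathbf q_i\|
 \le z_i\Bigl(\sum_{j=1}^{k}\|\mathbf q_j\|^2\Bigr)^{1/2}
 = O(\sqrt{mn})\cdot O\Bigl(\sqrt{\tfrac{m+n}{mn}}\Bigr) = O(\sqrt{m+n}),
 \qquad i = 1,\dots,k .
```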

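By definition, the vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ and the vectors $\mathbf{u}_1, \dots, \mathbf{u}_k$ are in the subspaces $F$ and $G$, respectively. Both spaces consist of piecewise constant vectors, thus the matrix $B$ is a blown up matrix containing $a \times b$ blocks. The 'noise' matrix is

$$E = A - B = \sum_{i=1}^{k} z_i \mathbf{v}_i \mathbf{q}_i^T + \sum_{i=1}^{k} z_i \mathbf{r}_i \mathbf{u}_i^T + \sum_{i=1}^{k} z_i \mathbf{r}_i \mathbf{q}_i^T + \sum_{i=k+1}^{\min\{m,n\}} z_i \mathbf{y}_i \mathbf{x}_i^T ,$$

which finishes the proof. □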






Then, provided the conditions of Theorem 19 hold, by the construction given in the proof above, an algorithm can be written that uses several SVDs and produces the blown up matrix $B$. This $B$ can be regarded as the best blown up approximation of the microarray $A$. At the same time, clusters of the genes and conditions are also obtained. More precisely, we first infer the clusters from the SVD of $A$, rearrange the rows and columns of $A$ accordingly, and then use the above construction. If we decide to perform correspondence analysis on $A$, then by (16) and (20), $B_{corr}$ will give a good approximation to $A_{corr}$ and, similarly, the correspondence vectors obtained by the SVD of $B_{corr}$ will give representatives of the genes and conditions.
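
The steps of the construction can be strung together as in the following NumPy sketch. It is an illustration under our own assumptions, not the authors' implementation: the clustering of the rows of $Y$ and $X$ is done by a plain k-means with farthest-point initialization (the text does not prescribe a clustering method), all function names are hypothetical, and $a, b \ge k$ with cluster centers spanning $k$-dimensional subspaces is assumed, so that $F$ and $G$ are $k$-dimensional.

```python
import numpy as np


def kmeans(pts, c, iters=50):
    """Plain Lloyd k-means on the rows of pts with farthest-point initialization."""
    centers = [pts[0]]
    for _ in range(1, c):
        # next center: the point farthest from all centers chosen so far
        d2 = np.min([np.sum((pts - z) ** 2, axis=1) for z in centers], axis=0)
        centers.append(pts[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        centers = np.array([pts[labels == j].mean(axis=0) for j in range(c)])
    return labels


def rotated_basis(M, labels):
    """Orthonormal basis of span(M_hat), rotated as close as possible to M."""
    M_hat = np.empty_like(M)
    for c in np.unique(labels):
        M_hat[labels == c] = M[labels == c].mean(axis=0)   # blown up cluster centers
    V_prime = np.linalg.qr(M_hat)[0]                       # assumes full column rank
    Q, _, Zt = np.linalg.svd(M.T @ V_prime)                # M^T V' = Q S Z^T
    return V_prime @ (Zt.T @ Q.T)                          # V = V' R, R = Z Q^T


def blown_up_approximation(A, k, a, b):
    Y_full, z, Xt = np.linalg.svd(A, full_matrices=False)
    Y, z, X = Y_full[:, :k], z[:k], Xt[:k].T               # top-k singular triplets
    row_labels, col_labels = kmeans(Y, a), kmeans(X, b)    # genes / conditions
    V, U = rotated_basis(Y, row_labels), rotated_basis(X, col_labels)
    B = (V * z) @ U.T                                      # B = sum_i z_i v_i u_i^T
    return B, row_labels, col_labels


# usage on synthetic data: 3 x 2 blocks plus +-1 Wigner-noise (illustrative values)
rng = np.random.default_rng(1)
m, n = 300, 200
P = np.array([[2.0, -2.0], [4.0, 2.0], [-2.0, 4.0]])
row_true, col_true = np.repeat(np.arange(3), m // 3), np.repeat(np.arange(2), n // 2)
A = P[row_true][:, col_true] + rng.choice([-1.0, 1.0], size=(m, n))
B, row_labels, col_labels = blown_up_approximation(A, k=2, a=3, b=2)
print(np.linalg.norm(A - B, 2) / np.sqrt(m + n))
```

On well-separated data such as this, the recovered $B$ is constant on the blocks and has rank $k$, in line with Theorem 19.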

To obtain the SVD of large matrices, randomized algorithms are at our disposal, e.g., [1]. There is nothing to lose when applying these algorithms, because they give the required results only if our matrix has a primary linear structure.
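
As an illustration of such a randomized method, here is a sketch of the randomized range finder with power iterations of Halko, Martinsson and Tropp. This technique is our own choice and is not necessarily the algorithm of [1]; the function name and the parameters `oversample` and `n_power` are assumptions of the sketch.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, n_power=2, seed=0):
    """Approximate top-k SVD by a randomized range finder with power iterations."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Qm = np.linalg.qr(A @ Omega)[0]                     # basis of the approximate range
    for _ in range(n_power):                            # power iterations sharpen the gap
        Qm = np.linalg.qr(A.T @ Qm)[0]
        Qm = np.linalg.qr(A @ Qm)[0]
    Ub, s, Vt = np.linalg.svd(Qm.T @ A, full_matrices=False)   # small (k+p) x n problem
    return (Qm @ Ub)[:, :k], s[:k], Vt[:k]


# usage on a blown up matrix burdened with +-1 noise (illustrative values)
rng = np.random.default_rng(2)
m, n = 300, 200
P = np.array([[1.0, -1.0], [2.0, 1.0], [-1.0, 2.0]])
A = P[np.repeat(np.arange(3), m // 3)][:, np.repeat(np.arange(2), n // 2)]
A = A + rng.choice([-1.0, 1.0], size=(m, n))
U, s, Vt = randomized_svd(A, k=2)
exact = np.linalg.svd(A, compute_uv=False)
```

Only products with $A$ and $A^T$ and decompositions of thin matrices are needed, which is what makes such methods attractive for large microarrays.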